Abstract:
A problem on multicore systems is cache sharing, where the cache occupancy of a program depends on the cache usage of peer programs. An exclusive cache hierarchy, as used on AMD processors, is an effective solution that allows processor cores to have a large private cache while still benefiting from a shared cache. The shared cache stores the "victims" (i.e., data evicted from the private caches). Performance therefore depends on how the victims of co-running programs interact in the shared cache.
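As a rough illustration of the victim-handling behavior described in this abstract, the following Python sketch models a private cache backed by an exclusive shared cache: blocks evicted from the private level become "victims" inserted into the shared level, and a shared-level hit moves the block back so the two levels stay exclusive. The sizes, LRU policy, and single-core view are illustrative assumptions, not details from the paper.

```python
from collections import OrderedDict

class ExclusiveHierarchy:
    """Toy model of a private cache backed by an exclusive shared victim cache."""

    def __init__(self, private_blocks=4, shared_blocks=8):
        self.private = OrderedDict()   # block -> None, ordered by recency (LRU first)
        self.shared = OrderedDict()
        self.private_blocks = private_blocks
        self.shared_blocks = shared_blocks

    def access(self, block):
        if block in self.private:                 # private hit
            self.private.move_to_end(block)
            return "private hit"
        if block in self.shared:                  # shared hit: move back, keep levels exclusive
            del self.shared[block]
            self._fill_private(block)
            return "shared hit"
        self._fill_private(block)                 # miss: fetch from memory into the private level
        return "miss"

    def _fill_private(self, block):
        if len(self.private) >= self.private_blocks:
            victim, _ = self.private.popitem(last=False)   # evict the LRU private block
            self._insert_victim(victim)                    # the victim goes to the shared cache
        self.private[block] = None

    def _insert_victim(self, victim):
        if len(self.shared) >= self.shared_blocks:
            self.shared.popitem(last=False)                # drop the LRU victim entirely
        self.shared[victim] = None
```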
Abstract:
As CMP platforms are widely adopted, more and more cores are integrated onto the die. To reduce off-chip memory accesses, the last-level cache is usually organized as a distributed shared cache. To avoid hot-spots, cache lines are interleaved across the distributed shared cache slices using a hash function. However, as the number of cores and cache slices in the platform increases, most data references go to remote cache slices, increasing access latency significantly. In this paper, we propose a hybrid last-level cache, which has some amount of private space and some amount of shared space on each cache slice. For workloads with no sharing, the goal is to provide more hits in the local slice while still keeping the overall miss rate low. For workloads with sufficient sharing, the goal is to allow more sharing in the last-level cache slice. We present hybrid last-level cache design options and study their hit/miss-rate behavior for a number of important server applications and multi-programmed workloads. Our simulation results on multi-programmed workloads based on SPEC CINT2000 as well as multithreaded workloads based on commercial server benchmarks (TPCC, SPECjbb, SAP and TPCE) show that this architecture is advantageous, especially since it can improve the local hit rate significantly while keeping the overall miss rate similar to that of a shared cache.
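To make the slice organization concrete, here is a minimal sketch of a hash-interleaved LLC with a hybrid slice layout: a lookup first checks the private region of the requester's local slice, then the shared region of the line's home slice. The modulo hash, 64-byte line size, and set-based bookkeeping are placeholders, not the paper's design.

```python
def home_slice(addr, num_slices):
    """Hash-interleave cache lines across slices (simple modulo hash as a stand-in)."""
    return (addr >> 6) % num_slices   # 64-byte lines assumed

class HybridSlice:
    """One LLC slice with a private region (local core only) and a shared region."""
    def __init__(self):
        self.private = set()   # lines cached privately for the local core
        self.shared = set()    # lines in the hash-interleaved shared region

def lookup(addr, core_id, slices):
    """Check the local slice's private region before going to the home slice."""
    local = slices[core_id]
    if addr in local.private:
        return "local private hit"              # low latency
    home = slices[home_slice(addr, len(slices))]
    if addr in home.shared:
        return "shared hit (possibly remote)"   # remote latency if the home slice differs
    return "LLC miss"

slices = [HybridSlice() for _ in range(8)]
slices[0].private.add(0x1000)
print(lookup(0x1000, core_id=0, slices=slices))  # -> "local private hit"
```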
Abstract:
A new Web cache sharing scheme is presented. Our method reduces the duplicated copies of the same objects in global shared Web caches, even though the hot working set of each local cache can be duplicated. Experimental results show that the proposed scheme outperforms existing sharing schemes.
Abstract:
When several applications are co-scheduled to run on a system with multiple shared LLCs, there is opportunity to improve system performance. This opportunity can be exploited by the hardware, software, or a combination of both hardware and software. The software, i.e., an operating system or hypervisor, can improve system performance by co-scheduling jobs on LLCs to minimize shared cache contention. The hardware can improve system throughput through better replacement policies, by allocating more cache resources to applications that benefit from the cache and less to those that do not. This study presents a detailed analysis of the interactions between intelligent scheduling and smart cache replacement policies. We find that smart cache replacement reduces the burden on software to provide intelligent scheduling decisions. However, under smart cache replacement, there is still room to improve performance from better application co-scheduling. We find that co-scheduling decisions are a function of the underlying LLC replacement policy. We propose Cache Replacement and Utility-aware Scheduling (CRUISE), a hardware/software co-designed approach for shared cache management. For 4-core and 8-core CMPs, we find that CRUISE approaches the performance of an ideal job co-scheduling policy under different LLC replacement policies.
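The abstract's point that co-scheduling should be utility-aware can be illustrated with a simple greedy pairing heuristic (not the CRUISE algorithm itself): applications estimated to be most cache-sensitive are paired with the least sensitive ones on each shared LLC, so sensitive applications face less contention.

```python
def pair_jobs_for_llcs(apps, sensitivity):
    """Greedy co-scheduling sketch: pair a cache-sensitive app with an
    insensitive one on each shared LLC.

    `sensitivity` maps app -> estimated benefit from extra LLC space
    (e.g. a miss-curve slope measured online). This illustrates
    utility-aware co-scheduling in general, not CRUISE specifically.
    """
    ordered = sorted(apps, key=lambda a: sensitivity[a], reverse=True)
    pairs = []
    while len(ordered) >= 2:
        most = ordered.pop(0)        # most cache-sensitive remaining app
        least = ordered.pop(-1)      # least cache-sensitive remaining app
        pairs.append((most, least))  # these two share one LLC
    if ordered:
        pairs.append((ordered.pop(),))
    return pairs

print(pair_jobs_for_llcs(["A", "B", "C", "D"],
                         {"A": 0.9, "B": 0.1, "C": 0.7, "D": 0.2}))
# -> [('A', 'B'), ('C', 'D')]
```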
Abstract:
A new Web cache sharing scheme is presented. Our scheme reduces the duplicated copies of the same objects in global shared Web caches. It also reduces the message overhead of existing schemes significantly. Trace-driven simulations with actual Web cache logs show that the proposed scheme performs better than the two well-known Web cache sharing schemes, the Internet Cache Protocol and the Cache Array Routing Protocol.
Abstract:
This paper presents a new shared cache technique, the grouping cache, which avoids many of the invalid queries of the broadcast probe and the control bottleneck of the centralized web cache by dividing all cooperative caches into several groups according to their positions in the network topology. The technique has the following characteristics: the overhead of multi-cache queries can be reduced efficiently by the cache grouping scheme; the compact summary of the cache directory can rapidly determine whether a request exists in a cache within the group; and the distribution algorithm based on web-access logs can effectively balance the load among all the groups. Simulation tests demonstrate that the grouping cache is more effective than other existing shared cache techniques.
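The abstract does not specify how the "compact summary of the cache directory" is realized; as one plausible illustration, the sketch below uses a small Bloom-filter-style summary that answers "is this URL possibly cached somewhere in the group?" with no false negatives. The filter size and hash count are illustrative assumptions.

```python
import hashlib

class GroupSummary:
    """Compact per-group directory summary, sketched as a small Bloom filter."""

    def __init__(self, bits=1 << 16, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.bitmap = bytearray(bits // 8)

    def _positions(self, url):
        for i in range(self.hashes):
            h = hashlib.sha1(f"{i}:{url}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, url):
        """Record that some cache within the group now holds this object."""
        for p in self._positions(url):
            self.bitmap[p // 8] |= 1 << (p % 8)

    def may_contain(self, url):
        """False means definitely not cached in the group; True may be a false positive."""
        return all(self.bitmap[p // 8] & (1 << (p % 8)) for p in self._positions(url))

summary = GroupSummary()
summary.add("http://example.com/index.html")
print(summary.may_contain("http://example.com/index.html"))  # True
print(summary.may_contain("http://example.com/other.html"))  # almost certainly False
```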
Abstract:
This paper proposes dynamic cache partitioning amongst simultaneously executing processes/threads. We present a general partitioning scheme that can be applied to set-associative caches. Since the memory reference characteristics of processes/threads can change over time, our method collects the cache miss characteristics of processes/threads at run-time. The workload is also determined at runtime by the operating system scheduler. Our scheme combines this information and partitions the cache amongst the executing processes/threads. Partition sizes are varied dynamically to reduce the total number of misses. The partitioning scheme has been evaluated using a processor simulator modeling a two-processor CMP system. The results show that the scheme can improve the total IPC significantly over the standard least recently used (LRU) replacement policy. In a certain case, partitioning doubles the total IPC over standard LRU. Our results show that smart cache management and scheduling are essential to achieve high performance with shared cache memory.
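A common way to turn run-time miss characteristics into partition sizes is greedy marginal-gain allocation over per-process miss curves; the sketch below illustrates that idea, without claiming to reproduce the paper's exact scheme.

```python
def partition_ways(miss_curves, total_ways):
    """Greedy way allocation: repeatedly give the next cache way to the
    process whose measured miss curve drops the most from one extra way.

    `miss_curves[p][w]` = misses of process p when given w ways
    (collected at run time). This is a generic marginal-gain sketch of
    miss-driven partitioning, not the paper's exact algorithm.
    """
    alloc = {p: 0 for p in miss_curves}
    for _ in range(total_ways):
        def gain(p):
            w = alloc[p]
            curve = miss_curves[p]
            if w + 1 >= len(curve):
                return 0
            return curve[w] - curve[w + 1]   # misses saved by one more way
        best = max(alloc, key=gain)
        alloc[best] += 1
    return alloc

# Example: process "a" benefits from extra ways much more than "b".
curves = {"a": [100, 60, 35, 20, 15], "b": [50, 48, 47, 46, 46]}
print(partition_ways(curves, total_ways=4))  # e.g. {'a': 3, 'b': 1}
```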
Abstract:
We consider a centralized caching network, where a server serves several groups of users, each having a common shared homogeneous fixed-size cache and requesting arbitrary multiple files. An existing coded prefetching scheme is employed, where each file is broken into multiple fragments and each cache stores multiple coded packets, each formed by XORing fragments from different files. For such a system, we propose an efficient file delivery scheme by the server to meet the arbitrary multi-requests of all user groups. Specifically, the stored coded packets of each cache are classified into four types based on the composition of the encoded file fragments. A delivery strategy is developed, which first separately delivers a part of each packet type, and then combinatorially delivers the remaining different packet types in the last stage. The rate, as well as the worst-case rate, of the proposed delivery scheme is analyzed. We show that our caching model and delivery scheme can incorporate some existing coded caching schemes as special cases. Moreover, for the special case of uniform requests and uncoded prefetching, we make a comparison with existing results and show that our approach can achieve a lower delivery rate. We also provide numerical results on the delivery rate of the proposed scheme.
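The core recovery step behind XOR-coded prefetching can be shown in a few lines: a cache stores one coded packet instead of two fragments, and delivering either fragment in the clear lets the cache's users recover the other with a single XOR. The fragment names and sizes below are purely illustrative; the paper's packet classification and multi-stage delivery are not reproduced.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Two equal-length fragments from different files (equal sizing is assumed).
frag_A = b"AAAAAAAA"   # fragment of file A
frag_B = b"BBBBBBBB"   # fragment of file B

# Coded prefetching: the cache stores a single coded packet instead of both fragments.
coded_packet = xor_bytes(frag_A, frag_B)

# Delivery: if the server sends frag_A in the clear, a user-group whose cache
# holds the coded packet recovers frag_B with one XOR (and vice versa).
recovered_B = xor_bytes(coded_packet, frag_A)
assert recovered_B == frag_B
```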
Abstract:
In a mobile ad hoc network (MANET), under normal cache sharing scenarios, when data is transmitted from source to destination, all nodes along the path store the information in their cache layer before it reaches the destination. This may result in increased node overhead, increased cache memory utilization, and very high end-to-end delay. In this paper, we propose an enhanced cache sharing through cooperative data cache (ECSCDC) approach for MANETs. During the transmission of the desired data from the data centre back to the request originator, the data packets are cached by intermediate caching nodes only if required, using the asymmetric cooperative cache approach. The caching nodes that retain the data in their caches for future retrieval are selected based on the scaled power community index. Simulation results show that the proposed technique reduces the communication overhead, access latency, and average traffic ratio near the data centre while increasing the cache hit ratio.
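As a sketch of the node-selection step, the snippet below picks intermediate nodes on the forwarding path whose scaled power community index exceeds a threshold. The index computation, threshold value, and per-path cache limit are illustrative assumptions rather than values from the paper.

```python
def select_caching_nodes(path, spci, threshold=0.5, max_caches=2):
    """Pick intermediate nodes on the source -> destination path that should
    cache the passing data.

    `spci` maps node -> its scaled power community index (computed elsewhere);
    the threshold and per-path limit are illustrative assumptions.
    """
    intermediates = path[1:-1]                      # exclude source and destination
    ranked = sorted(intermediates, key=lambda n: spci[n], reverse=True)
    return [n for n in ranked if spci[n] >= threshold][:max_caches]

path = ["S", "n1", "n2", "n3", "D"]
spci = {"n1": 0.8, "n2": 0.3, "n3": 0.6}
print(select_caching_nodes(path, spci))  # -> ['n1', 'n3']
```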